Noise-robust hands-free voice activity detection with adaptive zero crossing detection using talker direction estimation
نویسندگان
چکیده
This paper proposes a novel hands-free voice activity detection (VAD) method utilizing not only temporal features but also spatial features, called adaptive zero crossing detection (AZCD), that uses talker direction estimation. It firstly estimates talker direction to extract two spatial features: spatial reliability and spatial variance, based on weighted cross-power spectrum phase analysis and maximum likelihood estimation. Then, the AZCD detects voice activity frames by robustly detecting zero crossing information of speech with adaptively controlled thresholds using the extracted spatial features in noisy environments. The experimental results in an actual office room confirmed that the VAD performance of the proposed method that utilizes both temporal and spatial features is superior to that of the conventional method that utilizes only the temporal or spatial features.
منابع مشابه
Noise robust voice activity detection using features extracted from the time-domain autocorrelation function
This paper presents a method of voice activity detection (VAD) for high noise scenarios, using a noise robust voiced speech detection feature. The developed method is based on the fusion of two systems. The first system utilises the maximum peak of the normalised time-domain autocorrelation function (MaxPeak). The second system uses a novel combination of cross-correlation and zero-crossing rat...
متن کاملVoice Activity Detection based on Optima Multiple Featu
This paper presents a voice activity detection (VAD) scheme that is robust against noise, based on an optimally weighted combination of features. The scheme uses a weighted combination of four conventional VAD features: amplitude level, zero crossing rate, spectral information, and Gaussian mixture model likelihood. This combination in effect selects the optimal method depending on the noise co...
متن کاملEvaluation of voice activity detection by combining multiple features with weight adaptation
For noise-robust automatic speech recognition (ASR), we propose a novel voice activity detection (VAD) method based on a combination of multiple features. The scheme uses a weighted combination of four conventional VAD features: amplitude level, zero crossing rate, spectral information, and Gaussian mixture model (GMM) likelihood. The weights for combination are adaptively updated using minimum...
متن کاملPerformance Analysis of Voice Activity Detection Algorithms for Robust Speech Recognition
The emerging applications of speech technology especially in the fields of wireless applications, digital hearing aids or speech recognition are often requiring a noise reduction technique in combination with a precise Voice Activity Detector (VAD). In this paper, we compare the performance of the VAD algorithms like Zero Crossing Detection(ZCD), Weak Fricative Detection (WFD), Pitch Based Dete...
متن کاملTurn taking-based conversation detection by using DOA estimation
We propose a new method that detects conversation groups when multi-conversation groups exist simultaneously. The proposed method uses hands-free microphone arrays without wearable microphones. It has two main features: (a) We integrate a conventional turn taking-based conversation detection method with Direction of Arrival (DOA) estimation-based Voice Activity Detection (VAD). (b) The proposed...
متن کامل